A Compression Technique for Arabic Dictionaries: The Affix Analysis
Abstract
In every application that concerns the automatic processing of natural language, the problem of dictionary size arises. In this paper, we propose a dictionary compression algorithm based on an affix analysis of non-diacritical Arabic. It consists of decomposing a word into its basic elements, taking into account the different linguistic transformations that can affect the morphological structures. This work was carried out as part of a study of the automatic detection and correction of spelling errors in non-diacritical Arabic texts.

I INTRODUCTION

In every application that concerns the automatic processing of natural language, the problem of dictionary size arises. We can approach this important question in several ways, in particular:

By grouping together the common prefixes of the different words of the language. In the PIAF system (an interactive program for French analysis), for instance, words are represented in chained lists following alphabetical order [COUR 77].

EX: PARTIEL -> PARTIES -> PARTOUT ...
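The idea of grouping common prefixes can be illustrated with a minimal sketch. It uses front coding, a generic prefix-sharing scheme chosen here for illustration; it is not the chained-list representation of PIAF, nor the affix analysis proposed in the paper. The function names `front_code` and `front_decode` are hypothetical.

```python
def front_code(words):
    """Encode words as (shared_prefix_length, remaining_suffix) pairs.

    Assumption: each word is stored relative to the word that precedes
    it in alphabetical order, so a common prefix is written only once.
    """
    encoded = []
    previous = ""
    for word in sorted(words):
        # Count the leading characters shared with the previous word.
        shared = 0
        limit = min(len(previous), len(word))
        while shared < limit and previous[shared] == word[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(encoded):
    """Rebuild the alphabetically ordered word list from the pairs."""
    words = []
    previous = ""
    for shared, suffix in encoded:
        word = previous[:shared] + suffix
        words.append(word)
        previous = word
    return words


# The three French words from the example above.
pairs = front_code(["PARTIEL", "PARTIES", "PARTOUT"])
print(pairs)  # [(0, 'PARTIEL'), (6, 'S'), (4, 'OUT')]
```

Here only 11 characters are stored instead of 21, at the cost of having to decode a word relative to its predecessor.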
Similar resources
Morphological Analysis and Diacritical Arabic Text Compression
Morphological analysis of Arabic words allows decreasing the storage requirements of the Arabic dictionaries, more efficient encoding of diacritical Arabic text, faster spelling and efficient Optical character recognition. All these factors allow efficient storage and archival of multilingual digital libraries that include Arabic texts. This paper presents a lossless compression algorithm based...
Genetic Algorithms in Syllable-Based Text Compression
Syllable based text compression is a new approach to compression by symbols. In this concept syllables are used as the compression symbols instead of the more common characters or words. This new technique has proven itself worthy especially on short to middle-length text files. The effectiveness of the compression is greatly affected by the quality of dictionaries of syllables characteristic f...
“Uncommon terminations”: Proscription and morphological productivity
Discussions of the standardization of English vocabulary are seldom taken up with questions of morphology. Yet there is a history of, often strikingly similar, attempts to influence the use of particular word-formation processes, such as the proscription of individual lexical items on morphological grounds, or more precisely, the grounds that an affix is being “overextended”. This is not a refe...
Hermit Crabs: Formal Renewal of Morphology by Phonologically Mediated Affix Substitution
Logic Compression Of Dictionaries For Multilingual Spelling Checkers
To provide practical spelling checkers on micro-computers, good compression algorithms are essential. Current techniques used to compress lexicons for Indo-European languages provide efficient spelling checkers. Applying the same methods to languages which have a different morphological system (Arabic, Turkish, ...) gives insufficient results. To get better results, we apply other "logical" co...
Publication date: 1986